Golden Rule of Morphology and Variants of Wordforms
Abstract
In many languages, some words can be written in several ways. We call these alternative spellings variants. The values of all their morphological categories are identical, so they receive an identical morphological tag. Since the lemma is identical as well, we end up with two or more wordforms that share the same morphological description. This ambiguity may cause problems in various NLP applications. There are two types of variants: those affecting the whole paradigm (global variants) and those affecting only wordforms that share certain combinations of morphological values (inflectional variants). In this paper, we propose a way to tag all wordforms, including their variants, unambiguously. We call this requirement the "Golden rule of morphology". The paper deals mainly with Czech, but the ideas can be applied to other languages as well.
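The requirement stated in the abstract can be illustrated with a minimal sketch. The code below checks a small lexicon for pairs of lemma and tag that are shared by more than one wordform, and then applies one possible repair. Everything here is an assumption made for illustration: the English example entries, the Penn-style tag strings, the function names, and the variant-index suffix are hypothetical and are not taken from the paper, whose actual tagging scheme is not described in the abstract.

```python
from collections import defaultdict


def golden_rule_violations(entries):
    """Return {(lemma, tag): sorted wordforms} for every pair shared by 2+ wordforms."""
    forms = defaultdict(set)
    for wordform, lemma, tag in entries:
        forms[(lemma, tag)].add(wordform)
    return {key: sorted(group) for key, group in forms.items() if len(group) > 1}


def add_variant_index(entries):
    """Hypothetical repair: append a variant number to the tag of each ambiguous wordform."""
    violations = golden_rule_violations(entries)
    result = []
    for wordform, lemma, tag in entries:
        variants = violations.get((lemma, tag))
        if variants:
            # Number variants by their alphabetical position within the group.
            tag = f"{tag}-{variants.index(wordform) + 1}"
        result.append((wordform, lemma, tag))
    return result


# Illustrative lexicon of (wordform, lemma, tag) triples; not data from the paper.
LEXICON = [
    ("dreamed", "dream", "VBD"),
    ("dreamt", "dream", "VBD"),
    ("dreams", "dream", "VBZ"),
]

if __name__ == "__main__":
    print(golden_rule_violations(LEXICON))
    print(add_variant_index(LEXICON))
```

In this sketch, "dreamed" and "dreamt" play the role of inflectional variants, since they collide only for one combination of morphological values. After the repair, every pair of lemma and tag identifies exactly one wordform, which is the property the abstract calls the Golden rule of morphology.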
Similar resources
Reflections on the Application of the Golden Rule of Ethics in the Holy Quran
The golden rule of ethics has various formulations, and one of the most common says: "whatever you like for yourself, like it for others too, and whatever you do not like for yourself, do not like it for others either." The golden rule has been discussed in most religions, including the Abrahamic religions. There are numerous narrations with the concept of the golden rule but in the Holy Quran, it h...
Persia and the Golden Rule
My paper has two parts. First, I talk about the golden rule. After introducing the rule and its global importance, I explain why many scholars dismiss it as a vague proverb that leads to absurdities when we try to formulate it clearly. I defend the golden rule against such objections. Second, I talk about the golden rule in Persia and Islam; I consider Persian sources (Muslim and non-Muslim) an...
Efficient Stochastic Part-of-Speech Tagging for Hungarian
Many of the methods developed for Western European languages and widely used to produce annotated language resources cannot readily be applied to Central and Eastern European languages, due to the large number of novel phenomena in the syntax and morphology of these languages, which these methods have to handle but were not designed to cope with. The process of morphological ...
The Golden Principle of Ethics and Legal Prevention of Illegal Use of Databases
Background: Collecting data in the form of databases is one of the new methods that has been growing with the advancement of information technology. The use of databases in management and policy-making has created challenges in areas such as privacy breaches and compliance with accepted ethical norms. The increasing use of databases in public and private organizations in Iran has introduced new...
Rule-Based Normalization of Historical Texts
This paper deals with the normalization of language data from Early New High German. We describe an unsupervised, rule-based approach which maps historical wordforms to modern wordforms. Rules are specified in the form of context-aware rewrite rules that apply to sequences of characters. They are derived from two aligned versions of the Luther Bible and weighted according to their frequency. The eva...